5 research outputs found

    The shooting S-estimator for robust regression

    Full text link
    To perform multiple regression, the least squares estimator is commonly used. However, this estimator is not robust to outliers, so robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the subsequent procedure. However, a large residual may be caused by an outlier in only a single predictor variable, and downweighting the complete observation then results in a loss of information. We therefore propose the shooting S-estimator, a regression estimator designed especially for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the idea of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of giving up the regression equivariance property.
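
    The coordinate descent structure is easy to sketch. The Python fragment below cycles through the predictors and refits each slope robustly on partial residuals, which is what confines an outlying cell to a single coordinate update. Note that it uses a Theil-Sen simple regression as a stand-in for the simple S-regression inner step, so it illustrates the shape of the algorithm rather than the authors' exact estimator.

```python
# Sketch of shooting-style coordinatewise robust regression (illustration
# only). Assumption: Theil-Sen simple regression stands in for the simple
# S-regression inner step of the actual shooting S-estimator.
import numpy as np

def theil_sen_slope(x, r):
    """Robust slope of a simple regression: median of all pairwise slopes."""
    dx = x[:, None] - x[None, :]
    dr = r[:, None] - r[None, :]
    ok = np.abs(dx) > 1e-12          # skip pairs with (near-)equal x values
    return np.median(dr[ok] / dx[ok])

def shooting_regression(X, y, n_iter=20):
    """Cycle through coordinates; each slope is refit robustly on partial
    residuals, so an outlying cell in column j only disturbs the update
    of beta[j], not the whole fit."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = np.median(y)
    for _ in range(n_iter):
        for j in range(p):
            # Residuals with coordinate j's own contribution removed.
            r = y - intercept - X @ beta + X[:, j] * beta[j]
            beta[j] = theil_sen_slope(X[:, j], r)
        intercept = np.median(y - X @ beta)
    return intercept, beta

# Toy check with cellwise contamination in the first column only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)
X[:20, 0] = 50.0                     # contaminate 20% of the cells in x1
print(shooting_regression(X, y))
```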

    The Influence Function of Penalized Regression Estimators

    Full text link
    To perform regression analysis in high dimensions, lasso or ridge estimation are a common choice. However, it has been shown that these methods are not robust to outliers. Therefore, alternatives such as penalized M-estimation or the sparse least trimmed squares (LTS) estimator have been proposed. The robustness of these regression methods can be measured with the influence function, which quantifies the effect of infinitesimal perturbations in the data. Furthermore, it can be used to compute the asymptotic variance and the mean squared error. In this paper we compute the influence function, the asymptotic variance and the mean squared error for penalized M-estimators and the sparse LTS estimator. The asymptotic biasedness of the estimators makes the calculations nonstandard. We show that only M-estimators whose loss function has a bounded derivative are robust against regression outliers. In particular, the lasso has an unbounded influence function. (Appears in Statistics: A Journal of Theoretical and Applied Statistics.)
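
    The unboundedness of the lasso's influence function can be checked empirically with a replacement sensitivity curve, the finite-sample analogue of the influence function. The sketch below uses the closed-form lasso solution for simple regression without an intercept; the data and the penalty value lam are illustrative choices, not taken from the paper.

```python
import numpy as np

def lasso_1d(x, y, lam):
    """Exact lasso fit for simple regression without intercept:
    argmin_b (1/(2n)) * sum((y - b*x)**2) + lam * abs(b),
    i.e. soft-thresholding of the correlation term."""
    n = len(y)
    rho = x @ y / n
    return np.sign(rho) * max(abs(rho) - lam, 0.0) / (x @ x / n)

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
lam = 0.1
b = lasso_1d(x, y, lam)

# Replacement sensitivity curve: swap one response for an outlier z and
# rescale the shift by n. The shift grows linearly in z -- no bound.
for z in (1e1, 1e2, 1e3, 1e4):
    y_out = y.copy()
    y_out[0] = z
    print(z, n * (lasso_1d(x, y_out, lam) - b))
```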

    Robust and sparse estimation in high dimensions.

    No full text
    Classical parametric statistics commonly makes assumptions about the data (e.g. a normal distribution). These assumptions are often very strong and hardly fulfilled in practice. Another problem with real data is the occurrence of gross errors (e.g. typos) or the presence of subpopulations (a small number of observations that behave totally differently from the rest). Such atypical observations are called contaminated observations. To be robust against contamination, robust statistics focuses on fitting the desired model to the main part of the data, while not taking suspicious observations too much into account. The usual assumption in robust statistics is that the main part of the observed data follows a specified model distribution (as in classical parametric statistics), but that a small part of the observed data comes from an arbitrary, unspecified distribution. This assumption refers to rowwise contamination: the name comes from representing the observed data as a matrix in which the rows represent the observations and the columns the observed variables. Rowwise robust methods then flag a whole observation (row) as either outlying or not.

    In contrast, cellwise robust methods consider single cells of the matrix as outlying. Cellwise contamination can occur if the different variables are measured separately or obtained from different sources. In such scenarios, it seems more appropriate to allow some variables of one observation to be treated as outliers, while other variables of the same observation are labeled as clean. This is especially relevant if the number of observations is low and the number of variables large: treating a full observation as an outlier, even though only one cell is contaminated, would lead to a large loss of information. Furthermore, if the amount of cellwise contamination is so high that more than half of the observations are affected, most rowwise robust methods no longer give reliable results.

    In recent years, the number of data sets containing a large number of variables has been increasing rapidly. In practice, the collection of observations is rather expensive, so more and more data sets contain (many) more variables than observations. Such high-dimensional data sets often cannot be analyzed with classical statistics; least squares regression, for example, cannot be carried out because the problem is ill-posed. In this thesis, we study robustness properties of high-dimensional estimators in Chapter 1. A new robust, high-dimensional regression estimator is introduced and studied in Chapter 2. In Chapter 3, we study the robustness of a recently introduced covariance estimator. The last chapters are dedicated to cellwise robustness: a regression estimator for cellwise contamination is introduced in Chapter 4, and in Chapter 5 we develop a cellwise robust scatter estimator which is especially useful for high-dimensional analysis.
    We compare this estimator to other high-dimensional approaches in Chapter 6.

    Table of contents:
    1. The Influence Function of Penalized Regression Estimators: Introduction; Functionals; Bias; The Influence Function; The Influence Function of the Lasso; The Influence Function of sparse LTS; Plots of the Influence Function; Sensitivity Curves; Asymptotic Variance and Mean Squared Error; Conclusion; Appendix: Proofs
    2. Robust, high-dimensional regression using sparse S- and MM-estimation: Introduction; The sparse S- and MM-estimator; Breakdown point; Influence Function; The Algorithm; Simulations; Real data examples; Conclusions; Appendices: Standardization, Same amount of shrinkage with lambda_S and lambda, Proofs
    3. The Finite Sample Breakdown Point of PCS: Introduction; The PCS criterion (Illustrative Example); Finite sample breakdown point; Finite sample breakdown point of PCS; Appendices: Affine equivariance, Proof of Equation 3.15
    4. The shooting S-estimator for cellwise robust regression: Introduction; Motivation; Algorithm; Simulations; Real Data; Conclusion; Appendices: Description of Variables, R-code
    5. Cellwise robust high-dimensional precision matrix estimation: Introduction; Sparse precision matrix estimation for clean data; Cellwise robust, sparse precision matrix estimators (pairwise covariances, pairwise correlations, cellwise robust precision matrix estimation); Selection of the regularization parameter; Breakdown point; Simulations; Applications; Conclusions; Appendices: NPD algorithm, OGK algorithm
    6. Robust and sparse estimation of the inverse covariance matrix using rank correlation measures: Introduction; Estimators (Two-step Estimators, Three-step Estimators); Computation; Breakdown point; Simulations; Graphical models; Discussion
    Outlook
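
    A short calculation makes the cellwise argument from the abstract concrete: under the illustrative assumption that each cell is independently contaminated with probability eps, a row of p variables contains at least one outlying cell with probability 1 - (1 - eps)^p.

```python
# If each cell is independently contaminated with probability eps, the
# chance that a row with p variables is affected is 1 - (1 - eps)**p.
# Rowwise robust methods tolerate at most 50% outlying rows, so even a
# small eps breaks them once p is large.
for p in (10, 50, 200):
    for eps in (0.01, 0.05):
        frac = 1 - (1 - eps) ** p
        print(f"p={p:4d}  eps={eps:.2f}  affected rows: {frac:.1%}")
```

    Already at p = 50 and eps = 0.05, more than 90% of the rows contain at least one outlying cell, well past the 50% of outlying rows that rowwise robust methods can tolerate.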

    The Finite Sample Breakdown Point of PCS

    No full text
    The Projection Congruent Subset (PCS) is a new method for finding multivariate outliers. PCS returns an outlyingness index which can be used to construct affine equivariant estimates of multivariate location and scatter. In this note, we derive the finite sample breakdown point of these estimators.
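
    For background, the replacement version of the finite sample breakdown point is the smallest fraction m/n of observations that, once replaced by arbitrary values, can drive the estimate beyond any bound. The sketch below probes this notion for two simple location estimators; it illustrates the definition only and says nothing about PCS itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 101
x = rng.normal(size=n)

def corrupted(estimator, m, value=1e12):
    """Replace m of the n observations by an arbitrarily large value."""
    z = x.copy()
    z[:m] = value
    return estimator(z)

# The mean breaks down after a single replacement (breakdown point 1/n);
# the median resists until a majority of points is replaced (roughly 1/2).
for m in (1, 25, 50, 51):
    print(m, corrupted(np.mean, m), corrupted(np.median, m))
```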